Information Term Selection for Automatic Query Expansion

نویسندگان

  • Claudio Carpineto
  • Renato De Mori
  • Giovanni Romano
چکیده

Techniques for query expansion from top retrieved documents have been recently used by many groups at TREC, often on a purely empirical ground. In this paper we present a novel method for ranking and weighting expansion terms. The method is based on the concept of relative entropy, or Kullback-Lieber distance, developed in Information Theory, from which we derive a computationally simple and theoretically justified formula to assign scores to candidate expansion terms. This method has been incorporated into a comprehensive prototype ranking system, tested in the ad hoc track of TREC-7. The system’s overall performance was comparable to median performance of TREC-7 participants, wich is quite good considering that we are new to TREC and that we used unsophisticated indexing and weighting techniques. More focused experiments showed that the use of an information-theoretic component for query expansion significantly improved mean retrieval effectiveness over unexpanded query, yielding performance gains as high as 14% (for non interpolated average precision), while a per-query analysis suggested that queries that are neither too difficult nor too easy can be more easily improved upon.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparison of Global Term Expansion Methods for Text Retrieval

This paper describes our work at the fifth NTCIR workshop on the subtasks of single language information retrieval (SLIR). Several automatic global query expansion strategies were explored based on a machine-derive thesaurus. These term selection strategies were compared with manual selection and local expansion. Experiments show that all the global expansion strategies perform worse than the s...

متن کامل

QEA: A New Systematic and Comprehensive Classification of Query Expansion Approaches

A major problem in information retrieval is the difficulty to define the information needs of user and on the other hand, when user offers your query there is a vast amount of information to retrieval. Different methods , therefore, have been suggested for query expansion which concerned with reconfiguring of query by increasing efficiency and improving the criterion accuracy in the information...

متن کامل

Evolving Term-Selection Schemes for Pseudo-Relevance Feedback in Information Retrieval

Automatic query expansion in Information Retrieval aims to improve retrieval performance by overcoming the problem of term mismatch between a query and its relevant documents. Pseudorelevance (blind) feedback techniques have been shown to be of benefit on large TREC collections in recent years. This technique analyses terms in the top few documents deemed relevant by the system, reformulates th...

متن کامل

Query Expansion Using Term Distribution and Term Association

Good term selection is an important issue for an automatic query expansion (AQE) technique. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution based term selection compares the distribution of a term in the (pseudo) relevant documents with that in the whole corpus / random distribution. Two well-known distribution-based methods are b...

متن کامل

Informative term selection for automatic query expansion

Techniques for query expansion from top retrieved documents have been recently used by many groups at TREC, often on a purely empirical ground. In this paper we present a novel method for ranking and weighting expansion terms. The method is based on the concept of relative entropy, or Kullback-Lieber distance, developed in Information Theory, from which we derive a computationally simple and th...

متن کامل

Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach

Pseudo-Relevance Feedback (PRF) is a well-known method of query expansion for improving the performance of information retrieval systems. All the terms of PRF documents are not important for expanding the user query. Therefore selection of proper expansion term is very important for improving system performance. Individual query expansion terms selection methods have been widely investigated fo...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998